A Bounded Index for Cluster Validity
Abstract
Clustering is one of the best-known types of unsupervised learning. Evaluating the quality of results and determining the number of clusters in data are important issues. Most current validity indices cover only a subset of the important aspects of clusters. Moreover, these indices are relevant only for data sets containing at least two clusters. In this paper, a new bounded index for cluster validity, called the score function (SF), is introduced. The score function is based on standard cluster properties. Several artificial and real-life data sets are used to evaluate the performance of the score function. The score function is tested against four existing validity indices. The index proposed in this paper is found to be always as good as or better than these indices in the case of hyperspheroidal clusters. It is shown to work well on multidimensional data sets and is able to accommodate unique and sub-cluster cases.
Similar resources
A comprehensive validity index for clustering
Cluster validity indices are used for both estimating the quality of a clustering algorithm and for determining the correct number of clusters in data. Even though several indices exist in the literature, most of them are only relevant for data sets that contain at least two clusters. This paper introduces a new bounded index for cluster validity called the score function (SF), a double exponen...
Sum-of-Squares Based Cluster Validity Index and Significance Analysis
Different clustering algorithms achieve different results on certain data sets because most clustering algorithms are sensitive to the input parameters and the structure of data sets. Cluster validity, as a way of evaluating the results of clustering algorithms, is one of the problems in cluster analysis. In this paper, we build up a framework for the cluster validity process, meanwhile a sum-...
A cluster validity index for fuzzy clustering
Cluster validity indexes have been used to evaluate the fitness of partitions produced by clustering algorithms. This paper presents a new validity index for fuzzy clustering called a partition coefficient and exponential separation (PCAES) index. It uses the factors from a normalized partition coefficient and an exponential separation measure for each cluster and then pools these two factors t...
Development of An External Cluster Validity Index using Probabilistic Approach and Min-max Distance
Validating a given clustering result is a very challenging task in the real world. For this purpose, several cluster validity indices have been developed in the literature. Cluster validity indices are divided into two main categories: external and internal. External cluster validity indices rely on available supervised information, and internal validity indices utilize the intrinsic structu...
A Novel Validity Index for Determination of the Optimal Number of Clusters
The structural characteristics of clusters are investigated in the partitioning process. Two partition functions, which show opposite properties around the optimal cluster number, are found and a new cluster validity index is presented based on the combination of these functions. Some properties of the index function are discussed and numerical examples are presented. key words: clustering, val...